<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8" />
<meta http-equiv="X-UA-Compatible" content="IE=edge" />
<meta name="viewport" content="width=device-width, initial-scale=1" />
<meta name="renderer" content="webkit" />
<meta name="force-rendering" content="webkit"/>
<meta name="applicable-device" content="pc,mobile" />
<meta name="MobileOptimized" content="width" />
<meta name="HandheldFriendly" content="true" />
<meta http-equiv="Cache-Control" content="no-transform" />
<meta http-equiv="Cache-Control" content="no-siteapp" />
<meta name="format-detection" content="telephone=no" />
<link rel="shortcut icon" href="/favicon.ico?v=1.7.14" />
<link href="/templets/new/style/common.css?v=1.7.14" rel="stylesheet" />
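<title>Python Web Crawler Tutorial (From Beginner to Expert) - C语言中文网</title>
<meta name="description" content="A Python crawler (Python spider) is a web crawler written in Python; web crawlers are also known as web spiders. This tutorial starts from the basics and is aimed at beginners. It pairs explanations with many hands-on Python crawler examples so that you can put what you learn into practice." />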
</head>
<body>
<div id="header" class="clearfix">
<a id="logo" class="left" href="/">
<img height="26" src="/templets/new/images/logo.png?v=1.7.14" alt="Header logo" />
</a>
<ul id="nav-main" class="hover-none left clearfix">
<li class="wap-yes"><a href="/">Home</a></li>
<li><a href="/c/">C Tutorial</a></li>
<li><a href="/cplus/">C++ Tutorial</a></li>
<li><a href="/python/">Python Tutorial</a></li>
<li><a href="/java/">Java Tutorial</a></li>
<li><a href="/linux_tutorial/">Linux Basics</a></li>
<li><a href="/sitemap/" title="Site map">More&gt;&gt;</a></li>
</ul>
<span id="sidebar-toggle" class="toggle-btn" toggle-target="#sidebar">Contents <span class="iconfont"></span></span>
</div>
<div id="main" class="clearfix">
<div id="sidebar" class="toggle-target">
<div id="contents">
<dt><span class="iconfont iconfont-list-vertical" aria-hidden="true"></span><a href="/python_spider/">Python Crawlers</a></dt>
<dd>
<span class="channel-num">1</span>
<a href="/python_spider/what-is-spider.html">What Is a Web Crawler?</a>
</dd>
<dd>
<span class="channel-num">2</span>
<a href="/python_spider/webpage.html">Structure of a Web Page</a>
</dd>
<dd>
<span class="channel-num">3</span>
<a href="/python_spider/static-and-dynamic.html">Static and Dynamic Web Pages</a>
</dd>
<dd>
<span class="channel-num">4</span>
<a href="/python_spider/check-element.html">Inspecting Web Page Elements</a>
</dd>
<dd>
<span class="channel-num">5</span>
<a href="/python_spider/preparatory-work.html">Preparing to Learn</a>
</dd>
<dd>
<span class="channel-num">6</span>
<a href="/python_spider/the-first-spider.html">Your First Python Crawler</a>
</dd>
<dd>
<span class="channel-num">7</span>
<a href="/python_spider/user-agent.html">The User-Agent Header</a>
</dd>
<dd>
<span class="channel-num">8</span>
<a href="/python_spider/useragent-pool.html">User-Agent Pool</a>
</dd>
<dd>
<span class="channel-num">9</span>
<a href="/python_spider/url-coding.html">URL Encoding and Decoding</a>
</dd>
<dd>
<span class="channel-num">10</span>
<a href="/python_spider/crawl-webpage.html">[Example] Crawling a Web Page</a>
</dd>
<dd>
<span class="channel-num">11</span>
<a href="/python_spider/case01.html">[Example] Scraping Baidu Tieba Data</a>
</dd>
<dd>
<span class="channel-num">12</span>
<a href="/python_spider/regexp-syntax.html">Regular Expression Syntax</a>
</dd>
<dd>
<span class="channel-num">13</span>
<a href="/python_spider/re-module.html">Using the Python re Module</a>
</dd>
<dd>
<span class="channel-num">14</span>
<a href="/python_spider/csv-module.html">The Python csv Module</a>
</dd>
<dd>
<span class="channel-num">15</span>
<a href="/python_spider/case02.html">[Example] Scraping the Maoyan Movie Rankings</a>
</dd>
<dd>
<span class="channel-num">16</span>
<a href="/python_spider/pymysql.html">Storing Data with Python PyMySQL</a>
</dd>
<dd>
<span class="channel-num">17</span>
<a href="/python_spider/case03.html">[Example] Scraping Multi-Level Pages</a>
</dd>
<dd>
<span class="channel-num">18</span>
<a href="/python_spider/requests.html">The Python Requests Library</a>
</dd>
<dd>
<span class="channel-num">19</span>
<a href="/python_spider/crawl-photo.html">[Example] Downloading Images from the Web</a>
</dd>
<dd>
<span class="channel-num">20</span>
<a href="/python_spider/requests-args.html">Requests Methods and Parameters</a>
</dd>
<dd>
<span class="channel-num">21</span>
<a href="/python_spider/switchyomega.html">Proxy SwitchyOmega</a>
</dd>
<dd>
<span class="channel-num">22</span>
<a href="/python_spider/xpath.html">A Concise XPath Tutorial</a>
</dd>
<dd>
<span class="channel-num">23</span>
<a href="/python_spider/xpath-helper.html">Installing and Using XPath Helper</a>
</dd>
<dd>
<span class="channel-num">24</span>
<a href="/python_spider/lxml.html">The Python lxml Library</a>
</dd>
<dd>
<span class="channel-num">25</span>
<a href="/python_spider/lxml-case.html">[Example] Applying Python lxml</a>
</dd>
<dd>
<span class="channel-num">26</span>
<a href="/python_spider/case04.html">[Example] Scraping Lianjia Second-Hand Housing Data</a>
</dd>
<dd>
<span class="channel-num">27</span>
<a href="/python_spider/capture-package.html">Capturing Network Traffic in the Browser</a>
</dd>
<dd>
<span class="channel-num">28</span>
<a href="/python_spider/case05.html">[Example] Cracking Youdao Translate</a>
</dd>
<dd>
<span class="channel-num">29</span>
<a href="/python_spider/case06.html">[Example] Scraping Dynamically Loaded Data</a>
</dd>
<dd>
<span class="channel-num">30</span>
<a href="/python_spider/json.html">The Python json Module</a>
</dd>
<dd>
<span class="channel-num">31</span>
<a href="/python_spider/cookie-login.html">[Example] Simulating Login with Cookies</a>
</dd>
<dd>
<span class="channel-num">32</span>
<a href="/python_spider/multithreading.html">Multithreaded Crawlers in Python</a>
</dd>
<dd>
<span class="channel-num">33</span>
<a href="/python_spider/bs4.html">The Python BS4 Parsing Library</a>
</dd>
<dd>
<span class="channel-num">34</span>
<a href="/python_spider/case07.html">[Example] Downloading a Novel with a Crawler</a>
</dd>
<dd>
<span class="channel-num">35</span>
<a href="/python_spider/selenium.html">Downloading and Installing Selenium</a>
</dd>
<dd>
<span class="channel-num">36</span>
<a href="/python_spider/selenium-using.html">Using Selenium with Python</a>
</dd>
<dd>
<span class="channel-num">37</span>
<a href="/python_spider/selenium-case.html">[Example] Selenium in Practice</a>
</dd>
<dd>
<span class="channel-num">38</span>
<a href="/python_spider/scrapy.html">The Python Scrapy Crawler Framework</a>
</dd>
<dd>
<span class="channel-num">39</span>
<a href="/python_spider/scrapy-case.html">[Example] Applying the Scrapy Framework</a>
</dd>
</div>
</div>
<div id="article-wrap">
<div id="article">
<h1>Python Web Crawler Tutorial (From Beginner to Expert)</h1>
<div class="pre-next-page clearfix">&nbsp;</div>
<div id="arc-body"><p class="pb">
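<img alt="Python web crawler tutorial" src="/uploads/allimg/210819/9-210Q916331V55.gif" style="float: left;width: 100px; height: 92px;padding-right:12px;" />A web crawler (Web Spider), also called a &ldquo;web spider&rdquo; or &ldquo;web robot&rdquo;, is a program that fetches web page content from the Internet according to a set of rules. Search engines are the best-known users of crawlers: Baidu's crawler, &ldquo;Baiduspider&rdquo;, continuously fetches pages across the Internet and adds them to Baidu's index, so that when you search for a keyword, Baidu can return the matching pages from that index.</p>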
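A Python crawler is simply a crawler written in Python. Crawlers can also be written in other languages, such as Java or PHP, but Python is comparatively simpler and more practical: it offers many libraries and modules suited to crawling, and its syntax is concise and readable, which makes it well suited to beginners. As a result, &ldquo;Python crawler&rdquo; has become almost synonymous with &ldquo;web crawler&rdquo;. The main purpose of a web crawler is data collection, which makes it an essential tool for data analysis. Many companies have dedicated Python crawler engineer positions whose job is to supply the data that supports business growth. Crawlers also make everyday life more convenient, for example when snapping up train or plane tickets.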
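<p>To make &ldquo;fetching web page content according to a set of rules&rdquo; concrete, here is a minimal sketch. It is not code from this tutorial: it uses only the Python standard library (<code>urllib.request</code> and <code>html.parser</code>) rather than the Requests, lxml or Scrapy libraries covered in later chapters, and the function names <code>fetch_page</code> and <code>extract_links</code>, the UTF-8 decoding, and the browser-like User-Agent string are illustrative assumptions. The &ldquo;rule&rdquo; here is simply: download one page and collect every link on it.</p>

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import Request, urlopen


class LinkCollector(HTMLParser):
    """Collect the href value of every anchor tag in an HTML document."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag being parsed
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links such as "/python_spider/" against the page URL
                    self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Hypothetical helper: return the absolute URLs of all links in an HTML string."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


def fetch_page(url, timeout=10):
    """Hypothetical helper: download a page and decode it, assuming UTF-8."""
    # A browser-like User-Agent (assumed value); many sites reject Python's default one
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

<p>Calling <code>extract_links(fetch_page(url), url)</code> with a real page URL downloads that page and returns the absolute URLs of its links. A full crawler would add those URLs to a queue and repeat the process, which is the loop the later chapters build on with more capable libraries.</p>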
<h2>
Tutorial Features</h2>
<p class="pb">
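This tutorial is written specifically for beginners to Python web crawling, and it is also suitable for data analysts who want to advance their skills. If you are interested in Python crawlers, this tutorial is a good fit for you.</p>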
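The tutorial starts with the simplest kind of page analysis and focuses on the request and parsing modules most commonly used in Python web crawling. It also introduces two crawler-related tools: the Selenium browser-automation framework and the Scrapy crawler framework. To help beginners &ldquo;learn by doing&rdquo;, every topic pairs an explanation of the concepts with an analyzed crawler example, which lowers the barrier to entry. By working through the tutorial, you will gain a solid, well-rounded command of Python web crawling.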
<h2>
Prerequisites</h2>
<p>
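Before starting this tutorial, you should be comfortable with Python fundamentals and have a basic grasp of front-end languages (HTML, CSS, JavaScript) and SQL databases. Some familiarity with network communication protocols (TCP/IP or HTTP) will also be a great help.</p>
<a href="/python_spider/what-is-spider.html" id="click-to-learn">Click here to start learning ➜</a></div>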
<div class="pre-next-page clearfix">&nbsp;</div>
</div>
</div>
</div>
<script type="text/javascript">window.arcIdRaw ='t_' + 421;window.arcId ="9a4fYYon6Ix0Pa3CZSB1nI3Yek6znhkELLEncZ1sNXS+ogmwn518oEIeYQ";window.typeidChain ="421";</script>
<div id="footer" class="clearfix">
<div class="info left">
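<p>An online site where beginners learn programming, focused on sharing high-quality courses that are complete, comprehensive, and detailed, taking you from zero to an intermediate level. Your next tutorial doesn't have to be a book.</p>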
<p>
<a class="blue" href="/view/8066.html" target="_blank">About</a> <span class="break">|</span>
<a class="blue" href="/view/8093.html" target="_blank">Contact Us</a> <span class="break">|</span>
<a class="blue" href="/sitemap/" target="_blank">Site Map</a>
</p>
<p>
<span>Copyright ©2012-2024 biancheng.net</span><span class="break">&nbsp;</span>
<span><img class="beian" src="/templets/new/images/icp.png?v=1.7.14" alt="ICP filing" /> ICP filing: <a class="grey" href="https://beian.miit.gov.cn/" target="_blank">冀ICP备2022013920号-4</a></span><span class="break">&nbsp;</span>
<span><img class="beian" src="/templets/new/images/gongan.png?v=1.7.14" alt="Public security network filing" /> Public security network filing: <a class="grey" href="https://beian.mps.gov.cn/#/query/webSearch?code=13110202001352" target="_blank">冀公网安备13110202001352号</a></span>
</p>
</div>
<img id="logo_bottom" class="right" src="/templets/new/images/logo_bottom.gif?v=1.7.14" alt="Footer logo" />
<span id="return-top"><b>↑</b></span>
</div>
<script type="text/javascript">window.siteId =4;window.cmsTemplets ="/templets/new";window.cmsTempletsVer ="1.7.14";window.prePageURL ="/python_spider/";window.nextPageURL ="/python_spider/what-is-spider.html";</script>
<script src="/templets/new/script/jquery1.12.4.min.js"></script>
<script src="/templets/new/script/common.js?v=1.7.14"></script>
<span style="display: none;">
<script charset="UTF-8" id="LA_COLLECT" src="//sdk.51.la/js-sdk-pro.min.js"></script>
<script>LA.init({id:"KDf6QzBhogyQjall",ck:"KDf6QzBhogyQjall",autoTrack:true})</script>
</span>
</body>
</html>